Spell Checking in Spanish: The Case of Diacritic Accents
Abstract
This article presents the problem of diacritic restoration (or diacritization) in the context of spell-checking, with a focus on an orthographically rich language such as Spanish. We argue that, despite the large volume of work published on diacritization, currently available spell-checking tools still lack a proper solution for those cases where both forms of a word are listed in the checker’s dictionary. This is the case, for instance, when a word form exists with and without diacritics, such as continuo ‘continuous’ and continuó ‘he/she/it continued’, or when different diacritics make other word distinctions, as in continúo ‘I continue’. We propose a very simple solution based on a word bigram model derived from correctly typed Spanish texts and evaluate the ability of this model to restore diacritics in artificial as well as real errors. The case of diacritics is only meant as an example of the possible applications of this idea, and we believe that the same method could be applied to other kinds of orthographic or even grammatical errors. Moreover, given that no explicit linguistic knowledge is required, the proposed model can be used with other languages provided that a large normative corpus is available.
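As a rough illustration of the word bigram idea described in the abstract, the following Python sketch counts bigrams in correctly typed text and then, for each typed word, picks the diacritic variant that best fits its neighbours. It is not the authors' implementation: the function names (`strip_accents`, `train`, `restore`), the three-sentence toy corpus, the greedy left-to-right decoding, the lowercasing, and the choice to treat only the acute accent as removable are all assumptions made for this example.

```python
import unicodedata
from collections import Counter, defaultdict

ACUTE = "\u0301"  # combining acute accent; assumption: the only mark we restore


def strip_accents(word):
    """Remove acute accents but keep other marks (e.g. the tilde of ñ)."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch != ACUTE)
    return unicodedata.normalize("NFC", kept)


def train(sentences):
    """Count unigrams and bigrams, and group word forms by unaccented key."""
    unigrams, bigrams = Counter(), Counter()
    variants = defaultdict(set)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        for tok in tokens:
            variants[strip_accents(tok)].add(tok)
    return unigrams, bigrams, variants


def restore(sentence, model):
    """Greedily choose, word by word, the variant with the best bigram fit."""
    unigrams, bigrams, variants = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    out = tokens[:]
    for i in range(1, len(tokens) - 1):
        typed = tokens[i]
        candidates = variants.get(strip_accents(typed), {typed})

        def score(cand):
            # Left context is the already restored word, right context is
            # the next word as typed. Ties fall back to unigram frequency,
            # then to the form the user actually typed.
            context = bigrams[(out[i - 1], cand)] + bigrams[(cand, tokens[i + 1])]
            return (context, unigrams[cand], cand == typed)

        out[i] = max(sorted(candidates), key=score)
    return " ".join(out[1:-1])


# Toy "normative corpus" (hypothetical; a real model needs a large corpus).
corpus = [
    "el ruido continuo molesta",
    "ella continuó su camino",
    "yo continúo mi trabajo",
]
model = train(corpus)
print(restore("ella continuo su camino", model))
print(restore("yo continuo mi trabajo", model))
```

Because every candidate is itself a dictionary word, a conventional dictionary lookup would accept all three forms; only the surrounding words separate them here. A practical version would need smoothing for unseen bigrams and some handling of capitalisation and punctuation, none of which this sketch attempts.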
Similar resources
Grepator: Accents & Case Mix for Thesaurus
There is a real need among researchers and students for pedagogical resources. In France, information retrieval techniques have been developed, for example in the Doc'CISMeF web site. As Pubmed, documents are indexed with (French) MeSH terms, one of the problems discovered, in quality studies, is the inadequacies between the user requests and the MeSH controlled vocabulary. Moreover, French (bu...
Building ancient Spanish dictionaries for spell-checking of DL texts
Being aware of the usefulness of spell-checkers on the correction of modern works, and lacking this facility for ancient texts, we decided to build dictionaries for ancient Spanish. This decision led to new problems and new questions. We have built a time-aware system of dictionaries that takes into account the temporal dynamics of language, to help solve the problem of ancient Spanish spell-ch...
Rule-Based Spanish Morphological Analyzer Built From Spell Checking Lexicon
Preprocessing tools for automated text analysis have become more widely available in major languages, but non-English tools are often still limited in their functionality. When working with Spanish-language text, researchers can easily find tools for tokenization and stemming, but may not have the means to extract more complex word features like verb tense or mood. Yet Spanish is a morphological...
A contrastive study of Catalan and Spanish declarative intonation: Focus on Majorcan dialects
The goal of the present paper is to identify some of the differences in the intonation of Catalan and Spanish as spoken in Majorca. The tonal features we investigated were: (1) utterance-final pitch accents in broad focus declaratives, and (2) local contrastive focus pitch accents. Previous research, mostly on related varieties, such as Central Catalan and Castilian Spanish, had indirectly sugg...
An extended spell checker for unknown words
Spell checking is considered a solved problem, but with the rapid development of the natural language processing the new results are slowly extending the means of spell checking towards grammar checking. In this article I review some of the spell checking error classes in a broader sense, the related problems, their state-of-the-art solutions and their different nature on different types of lan...
Publication date: 2012